Metadata-Version: 2.1
Name: dateparser
Version: 1.1.8
Summary: Date parsing library designed to parse dates from HTML pages
Home-page: https://github.com/scrapinghub/dateparser
Author: Scrapinghub
Author-email: opensource@zyte.com
License: BSD
Project-URL: History, https://dateparser.readthedocs.io/en/latest/history.html
Keywords: dateparser
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.7
License-File: LICENSE
License-File: AUTHORS.rst
Requires-Dist: python-dateutil
Requires-Dist: pytz
Requires-Dist: regex (!=2019.02.19,!=2021.8.27)
Requires-Dist: tzlocal
Provides-Extra: calendars
Requires-Dist: hijri-converter ; extra == 'calendars'
Requires-Dist: convertdate ; extra == 'calendars'
Provides-Extra: fasttext
Requires-Dist: fasttext ; extra == 'fasttext'
Provides-Extra: langdetect
Requires-Dist: langdetect ; extra == 'langdetect'

==========================
Introduction to dateparser
==========================

Features
========

* Generic parsing of dates in over 200 language locales plus numerous formats in a language-agnostic fashion.
* Generic parsing of relative dates like: ``'1 min ago'``, ``'2 weeks ago'``, ``'3 months, 1 week and 1 day ago'``, ``'in 2 days'``, ``'tomorrow'``.
* Generic parsing of dates with time zone abbreviations or UTC offsets like: ``'August 14, 2015 EST'``, ``'July 4, 2013 PST'``, ``'21 July 2013 10:15 pm +0500'``.
* Date lookup in longer texts.
* Support for non-Gregorian calendar systems.
  See `Supported Calendars`_.
* Extensive test coverage.

Basic Usage
===========

The most straightforward way is to use the `dateparser.parse <#dateparser.parse>`_ function, which wraps most of the functionality in the module.

Popular Formats
---------------

    >>> import dateparser
    >>> dateparser.parse('12/12/12')
    datetime.datetime(2012, 12, 12, 0, 0)
    >>> dateparser.parse('Fri, 12 Dec 2014 10:55:50')
    datetime.datetime(2014, 12, 12, 10, 55, 50)
    >>> dateparser.parse('Martes 21 de Octubre de 2014')  # Spanish (Tuesday 21 October 2014)
    datetime.datetime(2014, 10, 21, 0, 0)
    >>> dateparser.parse('Le 11 Décembre 2014 à 09:00')  # French (11 December 2014 at 09:00)
    datetime.datetime(2014, 12, 11, 9, 0)
    >>> dateparser.parse('13 января 2015 г. в 13:34')  # Russian (13 January 2015 at 13:34)
    datetime.datetime(2015, 1, 13, 13, 34)
    >>> dateparser.parse('1 เดือนตุลาคม 2005, 1:00 AM')  # Thai (1 October 2005, 1:00 AM)
    datetime.datetime(2005, 10, 1, 1, 0)

This will try to parse a date from the given string, attempting to detect the language each time.

You can specify the language(s), if known, using the ``languages`` argument.
In this case, the given languages are used and language detection is skipped:

    >>> dateparser.parse('2015, Ago 15, 1:08 pm', languages=['pt', 'es'])
    datetime.datetime(2015, 8, 15, 13, 8)

If you know the possible formats of the dates, you can use the ``date_formats`` argument:

    >>> dateparser.parse('22 Décembre 2010', date_formats=['%d %B %Y'])
    datetime.datetime(2010, 12, 22, 0, 0)

Relative Dates
--------------

    >>> from dateparser import parse
    >>> parse('1 hour ago')
    datetime.datetime(2015, 5, 31, 23, 0)
    >>> parse('Il ya 2 heures')  # French (2 hours ago)
    datetime.datetime(2015, 5, 31, 22, 0)
    >>> parse('1 anno 2 mesi')  # Italian (1 year 2 months)
    datetime.datetime(2014, 4, 1, 0, 0)
    >>> parse('yaklaşık 23 saat önce')  # Turkish (23 hours ago)
    datetime.datetime(2015, 5, 31, 1, 0)
    >>> parse('Hace una semana')  # Spanish (a week ago)
    datetime.datetime(2015, 5, 25, 0, 0)
    >>> parse('2小时前')  # Chinese (2 hours ago)
    datetime.datetime(2015, 5, 31, 22, 0)

.. note:: Running the code above may return different values depending on your environment's current date and time.

.. note:: For the `Finnish` language, specify ``settings={'SKIP_TOKENS': []}`` to parse relative dates correctly.

OOTB Language Based Date Order Preference
-----------------------------------------

    >>> # parsing ambiguous date
    >>> parse('02-03-2016')  # assumes english language, uses MDY date order
    datetime.datetime(2016, 2, 3, 0, 0)
    >>> parse('le 02-03-2016')  # detects french, uses DMY date order
    datetime.datetime(2016, 3, 2, 0, 0)

.. note:: Ordering is not locale based, so do not expect `DMY` order for UK/Australian English. In that case, you can specify the date order using `settings`:

    >>> parse('18-12-15 06:00', settings={'DATE_ORDER': 'DMY'})
    datetime.datetime(2015, 12, 18, 6, 0)

For more on date order, please look at `settings`.

Timezone and UTC Offset
-----------------------

By default, `dateparser` returns a tzaware `datetime` if a timezone is present in the date string. Otherwise, it returns a naive `datetime` object.
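Whether a result is tzaware or naive can be checked with the standard library alone. The following is a minimal sketch, not part of `dateparser`: the helper name ``is_aware`` is hypothetical, and the two ``datetime`` values are built by hand to stand in for parsed results, so the snippet runs without `dateparser` installed.

```python
from datetime import datetime, timedelta, timezone


def is_aware(dt: datetime) -> bool:
    """Return True if ``dt`` carries usable timezone information.

    Hypothetical helper for illustration. It follows the rule in the
    standard ``datetime`` documentation: an object is aware only when
    ``tzinfo`` is set and ``utcoffset()`` does not return None.
    """
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


# Hand-built stand-ins for a naive and a tzaware parse result.
naive = datetime(2012, 1, 12, 22, 0)
aware = datetime(2012, 1, 12, 22, 0, tzinfo=timezone(timedelta(hours=-5)))

print(is_aware(naive))  # False
print(is_aware(aware))  # True
```

A check like this is useful before ordering or subtracting two results, since Python raises ``TypeError`` when an aware and a naive ``datetime`` are mixed in those operations. The examples that follow show what `parse` returns in each case.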

    >>> parse('January 12, 2012 10:00 PM EST')
    datetime.datetime(2012, 1, 12, 22, 0, tzinfo=)

    >>> parse('January 12, 2012 10:00 PM -0500')
    datetime.datetime(2012, 1, 12, 22, 0, tzinfo=)

    >>> parse('2 hours ago EST')
    datetime.datetime(2017, 3, 10, 15, 55, 39, 579667, tzinfo=)

    >>> parse('2 hours ago -0500')
    datetime.datetime(2017, 3, 10, 15, 59, 30, 193431, tzinfo=)

If the date has no timezone name/abbreviation or offset, you can specify it using the `TIMEZONE` setting.

    >>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': 'US/Eastern'})
    datetime.datetime(2012, 1, 12, 22, 0)

    >>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': '+0500'})
    datetime.datetime(2012, 1, 12, 22, 0)

The ``TIMEZONE`` option may not be useful alone, as it only attaches the given timezone to the resulting ``datetime`` object. It can be useful when you want conversions between timezones, or when you simply want a tzaware date with the given timezone info attached.

    >>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': 'US/Eastern', 'RETURN_AS_TIMEZONE_AWARE': True})
    datetime.datetime(2012, 1, 12, 22, 0, tzinfo=)

    >>> parse('10:00 am', settings={'TIMEZONE': 'EST', 'TO_TIMEZONE': 'EDT'})
    datetime.datetime(2016, 9, 25, 11, 0)

Some more use cases for timezone conversion:

    >>> parse('10:00 am EST', settings={'TO_TIMEZONE': 'EDT'})  # date string has timezone info
    datetime.datetime(2017, 3, 12, 11, 0, tzinfo=)

    >>> parse('now EST', settings={'TO_TIMEZONE': 'UTC'})  # relative dates
    datetime.datetime(2017, 3, 10, 23, 24, 47, 371823, tzinfo=)

If no timezone is present in the date string or defined in `settings`, you can still return a tzaware ``datetime``. This is especially useful for relative dates, when it is uncertain which timezone the relative base is in.

    >>> parse('2 minutes ago', settings={'RETURN_AS_TIMEZONE_AWARE': True})
    datetime.datetime(2017, 3, 11, 4, 25, 24, 152670, tzinfo=)

If you want to compute relative dates in UTC instead of the system's default local timezone, you can use the `TIMEZONE` setting.

    >>> parse('4 minutes ago', settings={'TIMEZONE': 'UTC'})
    datetime.datetime(2017, 3, 10, 23, 27, 59, 647248, tzinfo=)

.. note:: When a timezone is both present in the string and specified using `settings`, the string is parsed into a tzaware representation and then converted to the timezone specified in `settings`.

    >>> parse('10:40 pm PKT', settings={'TIMEZONE': 'UTC'})
    datetime.datetime(2017, 3, 12, 17, 40, tzinfo=)

    >>> parse('20 mins ago EST', settings={'TIMEZONE': 'UTC'})
    datetime.datetime(2017, 3, 12, 21, 16, 0, 885091, tzinfo=)

For more on timezones, please look at `settings`.

Incomplete Dates
----------------

    >>> from dateparser import parse
    >>> parse('December 2015')  # default behavior
    datetime.datetime(2015, 12, 16, 0, 0)
    >>> parse('December 2015', settings={'PREFER_DAY_OF_MONTH': 'last'})
    datetime.datetime(2015, 12, 31, 0, 0)
    >>> parse('December 2015', settings={'PREFER_DAY_OF_MONTH': 'first'})
    datetime.datetime(2015, 12, 1, 0, 0)

    >>> parse('March')
    datetime.datetime(2015, 3, 16, 0, 0)
    >>> parse('March', settings={'PREFER_DATES_FROM': 'future'})
    datetime.datetime(2016, 3, 16, 0, 0)
    >>> # parsing with preference set for 'past'
    >>> parse('August', settings={'PREFER_DATES_FROM': 'past'})
    datetime.datetime(2015, 8, 15, 0, 0)

You can also skip parsing incomplete dates altogether by setting the `STRICT_PARSING` flag as follows:

    >>> parse('December 2015', settings={'STRICT_PARSING': True})
    None

For more on handling incomplete dates, please look at `settings`.

Search for Dates in Longer Chunks of Text
-----------------------------------------

.. warning:: Support for searching dates is really limited and needs a lot of improvement; we look forward to the community's contributions to improve it.
   See "`contributing`".

You can extract dates from longer strings of text. They are returned as a list of tuples, each containing the text chunk with the date and the parsed datetime object.

Advanced Usage
==============

If you need more control over what is being parsed, check the `settings` section as well as the `using-datedataparser` section.

Dependencies
============

`dateparser` relies on the following libraries in some ways:

* dateutil_'s module ``relativedelta`` for its freshness parser.
* convertdate_ to convert *Jalali* dates to *Gregorian*.
* hijri-converter_ to convert *Hijri* dates to *Gregorian*.
* tzlocal_ to reliably get local timezone.
* ruamel.yaml_ (optional) for operations on language files.

.. _dateutil: https://pypi.python.org/pypi/python-dateutil
.. _convertdate: https://pypi.python.org/pypi/convertdate
.. _hijri-converter: https://pypi.python.org/pypi/hijri-converter
.. _tzlocal: https://pypi.python.org/pypi/tzlocal
.. _ruamel.yaml: https://pypi.python.org/pypi/ruamel.yaml

Supported languages and locales
===============================

You can check the supported locales by visiting the "`supported-locales`" section.

Supported Calendars
===================

Apart from the Gregorian calendar, `dateparser` supports the `Persian Jalali calendar` and the `Hijri/Islamic calendar`.

To be able to use them, you need to install the `calendars` extra by typing:

    pip install dateparser[calendars]

* Example using the `Persian Jalali calendar`. For more information, refer to `Persian Jalali Calendar `_.

    >>> from dateparser.calendars.jalali import JalaliCalendar
    >>> JalaliCalendar('جمعه سی ام اسفند ۱۳۸۷').get_date()
    DateData(date_obj=datetime.datetime(2009, 3, 20, 0, 0), period='day', locale=None)

* Example using the `Hijri/Islamic Calendar`. For more information, refer to `Hijri Calendar `_.

    >>> from dateparser.calendars.hijri import HijriCalendar
    >>> HijriCalendar('17-01-1437 هـ 08:30 مساءً').get_date()
    DateData(date_obj=datetime.datetime(2015, 10, 30, 20, 30), period='day', locale=None)

.. note:: `HijriCalendar` only works with Python ≥ 3.6.

.. :changelog:

History
=======

1.1.8 (2023-03-22)
------------------

Improvements:

- Improved date parsing for Chinese (#1148)
- Improved date parsing for Czech (#1151)
- Reorder language by popularity (#1152)
- Fix leak of memory in cache (#1140)
- Add support for "\d units later" (#1154)
- Move modification in CLDR data to yaml (#1153)
- Add support to use timezone via settings to get PREFER_DATES_FROM result (#1155)

1.1.7 (2023-02-02)
------------------

Improvements:

- Add an “ago” synonym for Arabic (#1128)
- Improved date parsing for Czech (#1131)
- Improved date parsing for Indonesian (#1134)

1.1.6 (2023-01-12)
------------------

Improvements:

- Fix the bug where Monday is parsed as a month (#1121)
- Prevent ReDoS in Spanish sentence splitting regex (#1084)

1.1.5 (2022-12-29)
------------------

Improvements:

- Parse short versions of day, month, and year (#1103)
- Add a test for “in 1d” (#1104)
- Update languages_info (#1107)
- Add a workaround for zipimporter not having exec_module before Python 3.10 (#1069)
- Stabilize tests at midnight (#1111)
- Add a test case for French (#1110)

Cleanups:

- Remove the requirements-build file (#1113)

1.1.4 (2022-11-21)
------------------

Improvements:

- Improved support for languages such as Slovak, Indonesian, Hindi, German and Japanese (#1064, #1094, #986, #1071, #1068)
- Recursively create a model home (#996)
- Replace regex sub with simple string replace (#1095)
- Add Python 3.10, 3.11 support (#1096)
- Drop support for Python 3.5, 3.6 versions (#1097)

1.1.3 (2022-11-03)
------------------

New features:

- Add support for fractional units (#876)

Improvements:

- Fix the returned datetime skipping a day with time+timezone input and PREFER_DATES_FROM = 'future' (#1002)
- 
  Fix input translation breaking keep_formatting (#720)
- English: support "till date" (#1005)
- English: support “after” and “before” in relative dates (#1008)

Cleanups:

- Reorganize internal data (#1090)
- CI updates (#1088)

1.1.2 (2022-10-20)
------------------

Improvements:

- Added support for negative timestamp (#1060)
- Fixed PytzUsageWarning for Python versions >= 3.6 (#1062)
- Added support for dates with dots and spaces (#1028)
- Improved support for Ukrainian, Croatian and Russian (#1072, #1074, #1079, #1082, #1073, #1083)
- Added support for parsing Unix timestamps consistently regardless of timezones (#954)
- Improved tests (#1086)

1.1.1 (2022-03-17)
------------------

Improvements:

- Fixed issue with regex library by pinning dependencies to an earlier version (< 2022.3.15, #1046).
- Extended support for Russian language dates starting with lowercase (#999).
- Allowed to use_given_order for languages too (#997).
- Fixed link to settings section (#1018).
- Defined UTF-8 encoding for Windows (#998).
- Fixed directories creation error in CLI utils (#1022).

1.1.0 (2021-10-04)
------------------

New features:

* Support language detection based on ``langdetect``, ``fastText``, or a custom implementation (see #932)
* Add support for 'by