In the United States, tax season descends upon the country every April, requiring millions of Americans to spend hours deciphering cryptic documents and performing complex math just to figure out what they owe. Wouldn’t it be grand if a computer could take all the relevant documents and extract exactly what the IRS is looking for? Lending Document AI from Google Cloud supports common document types used for income tax filing, such as W-2s and 1099s. These advancements in machine learning technology now make it possible to alleviate some of the anxiety leading up to April 15th.
Lending Document AI is a Document Understanding solution that allows for classification and parsing of documents commonly used in the mortgage lending industry. The data in these unstructured files is then converted into a structured format, which can be stored in a database or used for analysis and calculations. You can read more about the product in the announcement blog post. For this tax filing use case, we will focus on automatically classifying and parsing the 2020 editions of the following forms:
This sample application creates an automated pipeline: the user bulk uploads a collection of PDFs, the Lending Document Splitter & Classifier classifies each document, and each PDF is sent to the appropriate specialized parser to extract the data. The extracted data can then be used to calculate an individual tax return and fill out a Form 1040.
Let’s explore how this application works. You can check out the sample code in this GitHub Repository.
- The user uploads multiple PDF files to the web application, hosted on Cloud Run.
- An API call is made to the Lending Document Splitter & Classifier for each PDF file.
- The output of the classifier (e.g. W-2, 1099-MISC, etc.) is then mapped to an appropriate specialized parser in the Google Cloud Project.
- Each document file is sent to the appropriate specialized parser that matches the document type.
- The entities are extracted by the parser processor and the data is written to Firestore.
- The raw data is then retrieved from Firestore and displayed to the user, showing the file classification and the extracted values from each form.
- The data values from all the forms are used together to calculate an individual income tax return.
- The calculated tax rates, incomes, and deductions are displayed to the user in a tabular format matching IRS Form 1040. The app also displays which form data was used for each field. (Some output fields use values from multiple forms, such as line 25b.)
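The classify-then-route steps above can be sketched in a few lines of Python. This is a minimal sketch, not the sample repository's code: the label strings, the processor IDs, and the `classify` and `parse` callables are hypothetical stand-ins for the Document AI calls, injected so that the routing logic can be read and run on its own.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from classifier labels to parser processor IDs.
# Real label strings and processor IDs depend on your own project.
PARSER_FOR_LABEL: Dict[str, str] = {
    "W-2": "w2-parser-id",
    "1099-MISC": "1099-misc-parser-id",
}


def route_documents(
    pdfs: Dict[str, bytes],
    classify: Callable[[bytes], str],
    parse: Callable[[str, bytes], Dict[str, str]],
    parser_for_label: Dict[str, str],
) -> List[dict]:
    """Classify each PDF, then send it to the parser mapped to its label."""
    results = []
    for file_name, content in pdfs.items():
        label = classify(content)
        processor_id = parser_for_label.get(label)
        # Files with an unmapped label are reported but not parsed.
        entities = parse(processor_id, content) if processor_id else None
        results.append({"file": file_name, "label": label, "entities": entities})
    return results


# Stub callables standing in for the classifier and parser API calls.
def fake_classify(content: bytes) -> str:
    return "W-2" if b"wages" in content else "UNKNOWN"


def fake_parse(processor_id: str, content: bytes) -> Dict[str, str]:
    return {"processor": processor_id, "size": str(len(content))}


demo = route_documents(
    {"a.pdf": b"wages 100", "b.pdf": b"other"},
    fake_classify,
    fake_parse,
    PARSER_FOR_LABEL,
)
```

In a real deployment the two stubs would be replaced by calls to the classifier and parser processors. Keeping them as parameters means the mapping table is the only thing that changes when a new document type is added.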
NOTE: The Lending Processors in this demo are in Limited GA as of March 2022. If you have a business use case for these processors, you can fill out and submit the Document AI limited access customer request form.
1. Clone the GitHub Repository to get the sample code.
git clone https://github.com/GoogleCloudPlatform/document-ai-samples.git
2. Enter the directory for the tax pipeline demo
4. Install the Python libraries:
pip install -r requirements.txt
6. Enable the Document AI API:
7. Set up application default credentials:
Deploy demo application
1. Edit the config.yaml file, adding your own project details:
docai_processor_location: us # Document AI Processor Location (us OR eu)
docai_project_id: YOUR_PROJECT_ID # Project ID for Document AI Processors
collection: tax_demo_documents # Set with your preferred Firestore Collection Name
project_id: YOUR_PROJECT_ID # Project ID for Firestore Database
2. Run setup scripts to create the processors and Cloud Run app in your project.
gcloud run deploy tax-demo --source .
3. Visit the deployed web page. (You should get a link from the deployment command.)
4. Upload Documents. I created some sample documents you can download from the sample-docs folder of the repository.
This demo currently supports the following Document Types (2020 Editions)
5. Click the “Upload” button and wait for processing to complete.
- The page will display the steps completed for each document file. These are also written to stdout for troubleshooting purposes.
6. View the extracted values from each file.
7. Click “Calculate Taxes” to see the tax calculation output.
Warning: This is NOT financial advice; it is for educational purposes only.
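To illustrate how one Form 1040 field can draw on several uploaded documents, here is a minimal sketch of the aggregation step. It is not the demo's actual calculation: the field names, the document dictionaries, and the line-to-form mapping in `LINE_SOURCES` are assumptions made for illustration, and whether the demo supports 1099-INT is not confirmed by this post.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

# Assumed mapping: Form 1040 line -> (form type, field name) pairs that feed it.
# Field names are hypothetical, not the parser's real entity types.
LINE_SOURCES: Dict[str, List[Tuple[str, str]]] = {
    "1": [("W-2", "wages")],
    "25a": [("W-2", "federal_income_tax_withheld")],
    "25b": [
        ("1099-MISC", "federal_income_tax_withheld"),
        ("1099-INT", "federal_income_tax_withheld"),
    ],
}


def aggregate_lines(documents: List[dict], line_sources: Dict[str, List[Tuple[str, str]]]):
    """Sum matching fields per 1040 line and record which files contributed."""
    totals: Dict[str, Decimal] = {}
    used: Dict[str, List[str]] = {}
    for line, sources in line_sources.items():
        total = Decimal("0")
        contributors: List[str] = []
        for doc in documents:
            for form_type, field in sources:
                if doc["form_type"] == form_type and field in doc["fields"]:
                    # Decimal avoids floating-point rounding on currency values.
                    total += Decimal(doc["fields"][field])
                    contributors.append(doc["file"])
        totals[line] = total
        used[line] = contributors
    return totals, used


# Example records shaped like parsed documents (values are made up).
documents = [
    {"file": "w2.pdf", "form_type": "W-2",
     "fields": {"wages": "50000.00", "federal_income_tax_withheld": "6000.00"}},
    {"file": "misc.pdf", "form_type": "1099-MISC",
     "fields": {"federal_income_tax_withheld": "150.50"}},
    {"file": "int.pdf", "form_type": "1099-INT",
     "fields": {"federal_income_tax_withheld": "20.25"}},
]

totals, used = aggregate_lines(documents, LINE_SOURCES)
```

Tracking `used` alongside `totals` mirrors the app's behavior of showing which form data fed each field.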
Congratulations! You now have a fully functional tax processing application that can also be modified for use with other workflows that require data from multiple specialized documents.
The Document AI API is flexible and modular enough that most of the code in this example can be reused for any specialized processor.
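As one illustration of that reuse, the step that turns parser output into a storable record does not need to know which processor produced it. The sketch below assumes entities that expose `type_`, `mention_text`, and `confidence`, as the Document AI Python client's entities do; the `Entity` dataclass here is a local stand-in, and the entity type names and the confidence threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Entity:
    """Local stand-in for a parsed entity (not the client library class)."""
    type_: str
    mention_text: str
    confidence: float


def entities_to_record(
    entities: Iterable[Entity], min_confidence: float = 0.0
) -> Dict[str, List[dict]]:
    """Group entities by type into a plain dict suitable for a document store."""
    record: Dict[str, List[dict]] = {}
    for entity in entities:
        if entity.confidence < min_confidence:
            continue  # Skip low-confidence extractions.
        record.setdefault(entity.type_, []).append(
            {"value": entity.mention_text.strip(), "confidence": entity.confidence}
        )
    return record


# Hypothetical entity types; real names depend on the processor used.
sample = [
    Entity("WagesTipsOtherCompensation", " 50,000.00 ", 0.98),
    Entity("EmployerName", "Acme", 0.4),
]
record = entities_to_record(sample, min_confidence=0.5)
```

Because the function only reads three attributes, the same code would work for output from a W-2 parser, a 1099 parser, or any other specialized processor.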
Now tax returns can be filed with minimal manual effort!
If you want to learn more about Document AI, check out the Cloud Documentation and these videos:
- Getting started with the Document AI platform
- Process billions of pages and cut operational costs with DocAI
And if you want more hands-on experience, I recommend following these step-by-step codelabs to get started with the key features of Document AI:
- Optical Character Recognition (OCR) with Document AI (Python)
- Form Parsing with Document AI (Python)
- Specialized Processors with Document AI (Python)
- Managing Document AI processors with Python
By: Holt Skinner (Developer Relations Engineer)
Source: Google Cloud Blog