Extracting Financial Data from Corporate Filings with the SEC

Python
Machine Learning

Our project is an application that uses machine learning, specifically Naive Bayes, to parse a text document and identify phrases that disclose an SEC investigation. The model reached its target accuracy of 95% with an 85:15 training/testing split. In the application, users search for a company by its CIK (Central Index Key) to get a list of its filings from the SEC database; those filings can then be extracted and processed by the machine learning model. The project aims to improve on the original workflow, a script that grabbed any sentence containing keywords such as "investigation", by optimizing the processing and extraction of results and by providing a more robust approach to text classification.
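To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes sentence classifier with an 85:15 training/testing split. It is not the project's actual code: the example sentences, the labels "disclosure" and "other", and the names `NaiveBayesClassifier`, `tokenize`, and `train_test_split` are all made up for illustration, and the real application may well rely on a library rather than a hand-written model.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, sentence):
        total_sentences = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no evidence, so skip them.
        tokens = [t for t in tokenize(sentence) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_sentences)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Smoothed log likelihood of the word given the class.
                seen = self.word_counts[label][token]
                score += math.log((seen + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_test_split(sentences, labels, test_fraction=0.15, seed=0):
    """Shuffle the data and hold out test_fraction of it for testing."""
    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([sentences[i] for i in train_idx], [labels[i] for i in train_idx],
            [sentences[i] for i in test_idx], [labels[i] for i in test_idx])


# Hypothetical labeled sentences; a real training set would be far larger.
EXAMPLES = [
    ("The SEC has opened a formal investigation into our accounting practices.", "disclosure"),
    ("We received a subpoena from the SEC in connection with an investigation.", "disclosure"),
    ("The company is cooperating with an SEC investigation of its revenue recognition.", "disclosure"),
    ("The staff of the SEC notified us of an investigation into our disclosures.", "disclosure"),
    ("Revenue increased twelve percent due to higher product sales.", "other"),
    ("We opened three new retail stores during the quarter.", "other"),
    ("Our investigation of new markets led to higher marketing costs.", "other"),
    ("The board declared a quarterly dividend payable in March.", "other"),
]
SENTENCES = [text for text, _ in EXAMPLES]
LABELS = [label for _, label in EXAMPLES]

train_x, train_y, test_x, test_y = train_test_split(SENTENCES, LABELS)
model = NaiveBayesClassifier().fit(train_x, train_y)
correct = sum(model.predict(x) == y for x, y in zip(test_x, test_y))
print(f"held-out accuracy: {correct / len(test_x):.2f}")
```

Note that the third "other" sentence contains the word "investigation" without disclosing one. A keyword script would flag it, whereas a probabilistic model weighs every word in the sentence, which is the kind of robustness the project is after. With only eight sentences the held-out accuracy printed here is not meaningful; it only shows where the evaluation step fits.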

Artifacts

Name | Description
SEC Team's Presentation | This video shows how we designed and developed the software and demonstrates some of its features. Link