On 7 June 2020, the first online hackathon of the “Digital Breakthrough” contest came to an end. The team of digital integrator DD Planet took first place in the geodata analysis case set by the Big Data Association and Sberbank.
Teaser photo source: Jesse Echevarria on Unsplash
This year the hackathon was held online and was attended by 250 teams from different regions of Russia. The DD Planet team, together with TSU graduates, competed under the name “Firmachi” and was headed by lead programmer Yuri Basalov. Within the 36 hours of the hackathon, participants had to solve cases from leading IT companies, startups and regional offices.
The case from the Big Data Association and Sberbank required teams to develop an intelligent system for preprocessing postal addresses that the bank's existing Normalizer cannot decompose. The result was to be an algorithm that corrects an address so that the bank's Normalizer can then process it successfully.
Team “Firmachi” presented a solution that structures addresses, removes unnecessary punctuation and blocks that are meaningless or hinder recognition, and brings each address to a standard form. The algorithm is based on patterns identified through statistical analysis of good and bad addresses, and it lets companies with large databases update them quickly. The solution is fully automatic, which keeps the cost of supporting it to a minimum.
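The cleaning steps described above can be illustrated with a minimal sketch. It is not the team's code: the article gives no implementation details, so the function name preprocess_address, the NOISE_TOKENS list and the choice of which punctuation to keep are all assumptions made for illustration, and Python is used only for brevity.

```python
import re

# Hypothetical noise tokens; the article does not list the blocks the team removed.
NOISE_TOKENS = {"n/a", "unknown", "none"}


def preprocess_address(raw: str) -> str:
    """Strip stray punctuation and noise tokens, then rejoin the address parts."""
    # Keep word characters, whitespace, commas, dots, slashes and hyphens;
    # replace every other character with a space (assumed rule).
    text = re.sub(r"[^\w\s,./-]", " ", raw)
    parts = []
    for part in text.split(","):
        # Drop tokens from the noise list; split() also collapses repeated spaces.
        tokens = [t for t in part.split() if t.lower() not in NOISE_TOKENS]
        if tokens:  # skip parts that end up empty
            parts.append(" ".join(tokens))
    return ", ".join(parts)


cleaned = preprocess_address("  Moscow,,  Tverskaya st.;; 7 !!, n/a ")
print(cleaned)
```

In a real system the noise list and the punctuation rules would presumably come from the statistical analysis of good and bad addresses rather than being hard-coded as they are here.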
Yuri Basalov, lead programmer at DD Planet: “During the development of the algorithm we faced a problem: we could not check our solution against Sberbank's Normalizer. So our team created a classification model that determines, with an accuracy of more than 98%, whether the Normalizer will recognize an address or not. Through statistical analysis we identified the blocks of addresses that “break” the Normalizer, and we deleted or transformed them. This approach allowed us to fix more than 67% of the detected bad addresses with minimal loss of information. We immediately adapted our solution to high loads by implementing it in Java + Apache Spark.”
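One simple way to look for blocks that “break” a normalizer is to compare how often each token occurs in bad versus good addresses. The sketch below illustrates that idea only; it is not the team's method, which the article does not describe in detail. The function name breaking_tokens, the thresholds min_count and min_bad_share, and whitespace tokenization are all assumptions.

```python
from collections import Counter


def breaking_tokens(good, bad, min_count=2, min_bad_share=0.9):
    """Return tokens that occur almost only in bad addresses, with their bad share."""
    # Count each token at most once per address.
    good_counts = Counter(tok for addr in good for tok in set(addr.lower().split()))
    bad_counts = Counter(tok for addr in bad for tok in set(addr.lower().split()))
    result = {}
    for tok, n_bad in bad_counts.items():
        total = n_bad + good_counts[tok]  # Counter returns 0 for unseen tokens
        share = n_bad / total
        # Keep tokens seen often enough and mostly in bad addresses.
        if total >= min_count and share >= min_bad_share:
            result[tok] = share
    return result


good = ["moscow tverskaya 7", "kazan lenina 1"]
bad = ["moscow ### tverskaya 7", "kazan ### lenina 1"]
suspects = breaking_tokens(good, bad)
print(suspects)
```

Tokens flagged this way would be candidates for deletion or transformation. The labels of good and bad addresses would have to come from the normalizer itself or, as in the team's case, from a classifier standing in for it.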