From c9529cf462394a738ce87b2c2aaf0f04c5153f21 Mon Sep 17 00:00:00 2001
From: IasonC
Date: Sat, 12 Apr 2025 15:28:28 +0100
Subject: [PATCH] readme update

---
 README.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index e994611..6cf9d7b 100644
--- a/README.md
+++ b/README.md
@@ -7,12 +7,15 @@ Cross platform system for GUI automation using Machine Vision. This repository i
 Here're some of the project's best features:
 
 * Cellular automaton inspired interactable detection algorithm
-* Selenium powered interactable detector
+* Automated data collection and labelling pipeline
+  * Selenium-based collection for computer websites
+  * Kotlin-based collection for Android apps (see our [AndroidGUICollection](https://github.com/IasonC/AndroidGUICollection) repository)
 * Crawler and analysis tool for KhanAcademy
+* Screen and keyboard recording, analysing, and saving tools
+* Machine Vision models to identify clickable GUI areas and changes in GUI state
 * Backend with MongoDB (you can replace with your api-key )
 * Tool for analysing OCR'ed data for page similarity matching
-* Voice powered GUI navigation
-* Screen and keyboard recording, analysing, and saving tools
+* Voice-powered GUI navigation
 * "Trace" creation tool (as explained in the paper)
 * "Trace" replication tool (as explained in the paper)
 * "Action matching" tool (as explained in the paper)
@@ -22,6 +25,12 @@ This repository uses Poetry for python package management. See poetry documentat
 
 Most of the Features of the project you can access through CLI defined in `./main.py`. Example: `python main.py hello-world`
 
+## 🧠 Machine Vision Models
+
+In this paper we implement Machine Vision models for Interactable Detection (FCOS object detector) and Screen Similarity detection (Siamese net). These enable our end-to-end features of robust and platform-independent (1) voice-powered GUI navigation and (2) "Trace" replication.
+
+See directory ```UIUnderstandingModels``` of branch ```Add/ModelTraining``` for the model implementation (PyTorch Lightning) and training. This directory also includes our complete datasets of GUIs across computer websites (KhanAcademy, Wikipedia, Wolfram-Alpha) and Android apps (Spotify).
+
 ## 🛡️ License:
 
 This project is licensed under the MIT