Howard's Homepage


Automatic Document Classification Using Artificial Intelligence

In the information age, with the rapid growth of the Internet and of electronic document resources, automatic document retrieval systems are becoming more important. A human can classify an article by glancing at it and recognizing words such as "sports" or "politics", without reading the whole text. However, when millions of files are involved, we cannot rely on human beings to go through all the documents, especially when no keywords are listed for the articles. We need computer systems that can complete document classification tasks automatically, which in turn requires a reliable automatic classifier that tags electronic files with their categories. Such a classifier can be used at libraries, educational institutions, governments, businesses that hold large amounts of customer data or documents, and IT companies that have to access huge amounts of data online.

In the proposed research project, an intelligent document classifier will be built in which the information gain method is used for feature selection. The classifier will select the words in a document that carry the most information for separating that document from others. The weights of the vectors describing the words will then be evaluated: if a word is distributed over more documents in the collection, its weight should be lower, because it carries little information for classifying the documents. Artificial intelligence will be used to train the classifier. Because the selected words are the most informative ones, the learning efficiency of the classifier is expected to be high.

The classifier will be tested on a standard database available online. Thousands of documents will be selected to train the classifier, and five to ten thousand documents will be selected as the test set. The effectiveness of the classifier will be demonstrated by comparison studies. Two standard performance measures will be used to evaluate its performance.
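
The feature selection and weighting ideas described above can be illustrated with a small sketch. It is not the project's implementation: the function names, the toy corpus, and the use of inverse document frequency as the weighting rule are all assumptions made for illustration. The project description only states that information gain is used to select words and that words spread over more documents should receive lower weights.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Reduction in label entropy after splitting documents on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    remainder = 0.0
    for part in (with_word, without_word):
        if part:  # skip an empty side of the split
            remainder += len(part) / n * entropy(part)
    return entropy(labels) - remainder


def idf(docs, word):
    """Assumed weighting rule: inverse document frequency.

    The weight falls as the word appears in more documents.
    """
    df = sum(1 for d in docs if word in d)
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical toy corpus: each document is represented as a set of words.
docs = [
    {"the", "match", "goal"},
    {"the", "team", "goal"},
    {"the", "vote", "election"},
    {"the", "election", "party"},
]
labels = ["sports", "sports", "politics", "politics"]

# Rank the vocabulary by information gain, most informative words first.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)

for w in ranked:
    print(f"{w:10s} IG={information_gain(docs, labels, w):.3f} IDF={idf(docs, w):.3f}")
```

In this toy corpus, "goal" and "election" each separate the two categories perfectly and so rank first, while "the" appears in every document and receives both zero information gain and zero weight. A real classifier would keep only the top-ranked words as features before training.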



This website was constructed using Vi IMproved. Last Update:  December 23, 2007. Howard Li. All Rights Reserved. Fredericton, New Brunswick, Canada.