Models, code, and papers for "Urmila Shrawankar":
In this paper we propose a simple approach to facial expression recognition that uses a Support Vector Machine (SVM) for expression classification. The problem is divided into three main modules. The first is face detection, for which we use a skin filter and face segmentation. We place more emphasis on feature extraction. This method is effective enough for applications that require fast execution. The second module is facial feature extraction, which is an essential part of expression recognition; here we use Edge Projection Analysis. Finally, the extracted feature vector is passed to the SVM classifier for expression recognition. We consider six basic expressions: anger, fear, disgust, joy, sadness, and surprise.
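The abstract does not detail Edge Projection Analysis. One common reading, used here only as an assumption, is to build an edge map of the face region and sum it along rows and columns; the horizontal and vertical projection profiles tend to peak near the brows, eyes, and mouth and can be concatenated into a feature vector for the SVM. The sketch assumes a grayscale image given as a list of rows and uses a simple finite-difference gradient in place of whatever edge detector the authors used; `edge_projection_features` is a hypothetical name.

```python
def edge_map(img):
    """Approximate edge strength as |horizontal diff| + |vertical diff|.
    img is a list of equal-length rows of grayscale intensities (assumed format).
    The last row and column stay 0 because they have no forward neighbour."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            edges[y][x] = abs(gx) + abs(gy)
    return edges


def edge_projection_features(img):
    """Hypothetical feature extractor: row sums then column sums of the edge map."""
    edges = edge_map(img)
    row_profile = [sum(row) for row in edges]
    col_profile = [sum(col) for col in zip(*edges)]
    return row_profile + col_profile


if __name__ == "__main__":
    # Toy 4x4 "face" with a bright horizontal bar in row 2 (e.g. a mouth line).
    face = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [9, 9, 9, 9],
        [0, 0, 0, 0],
    ]
    print(edge_projection_features(face))
```

The resulting vector would then be fed to an SVM classifier (for example scikit-learn's `SVC`), which is outside this sketch.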
Image processing, optimization, and prediction play a key role in computer science. Image processing provides a way to analyze and identify an image. Many areas, such as medical image processing, satellite images, natural images, and artificial images, require extensive analysis and research on optimization. In image optimization and prediction, we combine the features of query optimization, image processing, and prediction. Image optimization is used in pattern analysis and object recognition, in medical image processing to predict the type of disease, and in satellite imagery to forecast weather or predict the availability of water or minerals. Image processing, optimization, and analysis form a wide-open area for research. Much research has been conducted on image analysis, and many techniques are available, but no single technique has yet been identified for image analysis and prediction. Our research focuses on identifying a global technique for image analysis and prediction.
This software-project-based paper presents a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote, and gesture. The first step is speech. The dream of true virtual reality, a complete human-computer interaction system, will not come true unless we give machines some perception and let them perceive the outside world the way humans do when they communicate with each other. This software project, still under development, is a machine (computer) that listens and replies through speech. The speech interface converts speech input into a parametric form (speech-to-text) for further processing and converts the resulting text output into speech (text-to-speech).
Word ambiguity removal is the task of removing ambiguity from a word, i.e., identifying the correct sense of a word in an ambiguous sentence. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Human-computer interaction is much needed to improve interactions between users and computers; for this purpose, supervised and unsupervised methods are combined. The WSD algorithm finds the sense of a word efficiently and accurately based on domain information. The accuracy of this work is evaluated with the aim of finding the most suitable domain for a word.
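The abstract does not spell out how domain information is scored. A minimal sketch, assuming a small hand-built sense inventory in which each sense carries a domain label and a set of domain keywords (the `SENSES` table and `disambiguate` function are hypothetical), picks the sense whose domain keywords overlap most with the context words. A fuller system would first use the part-of-speech tag to restrict the candidate senses.

```python
# Hypothetical sense inventory: word -> list of (sense, domain, domain keywords).
SENSES = {
    "bank": [
        ("financial institution", "finance", {"money", "loan", "deposit", "account"}),
        ("side of a river", "geography", {"river", "water", "shore", "fishing"}),
    ],
}


def disambiguate(word, sentence):
    """Return the (sense, domain) whose keywords overlap most with the context.
    Ties go to the sense listed first in the inventory."""
    context = {token.strip(".,!?").lower() for token in sentence.split()}
    best, best_score = None, -1
    for sense, domain, keywords in SENSES[word]:
        score = len(context & keywords)
        if score > best_score:
            best, best_score = (sense, domain), score
    return best


print(disambiguate("bank", "He sat on the bank of the river, fishing."))
print(disambiguate("bank", "She opened an account at the bank to deposit money."))
```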
The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for these interfaces to be natural and truly useful, they should cover a large group of users. The purpose of such systems is to further improve man-machine communication. ASR systems exhibit unacceptable degradation in performance when the acoustic environments used for training and testing the system differ. The goal of this research is to increase the robustness of speech recognition systems to changes in the environment. A system can be labeled environment-independent if its recognition accuracy in a new environment is the same as or higher than that obtained when the system is retrained for that environment. Attaining such performance is a long-standing goal of researchers. This paper elaborates on some of the difficulties with ASR, classifies them into speaker characteristics and environmental conditions, and suggests some techniques to compensate for variations in the speech signal. It focuses on robustness to speaker variation and to changes in the acoustic environment. We discuss several external factors that change the environment and physiological differences that affect the performance of a speech recognition system, followed by techniques that help in designing a robust ASR system.
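The abstract does not name the compensation techniques it discusses. One widely used example for environment and channel mismatch, offered here only as an illustration, is cepstral mean normalization (CMN): a fixed convolutional channel becomes an additive offset in the cepstral domain, so subtracting each coefficient's per-utterance mean removes it. The sketch assumes features arrive as a list of per-frame coefficient lists; `cepstral_mean_normalize` is a hypothetical name.

```python
def cepstral_mean_normalize(frames):
    """Subtract the utterance-level mean of each cepstral coefficient.
    frames: list of frames, each a list of cepstral coefficients of equal length."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# A constant offset added to every frame (a stand-in for a fixed channel)
# disappears after normalization.
clean = [[1.0, 2.0], [3.0, 4.0]]
shifted = [[c + 5.0 for c in frame] for frame in clean]
print(cepstral_mean_normalize(clean) == cepstral_mean_normalize(shifted))
```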
Existing research shows that many techniques and methodologies are available for every step of an Automatic Speech Recognition (ASR) system, but the performance of a methodology (minimizing the Word Error Rate, WER, and maximizing the Word Accuracy Rate, WAR) does not depend only on the technique applied. The research indicates that performance depends mainly on the category of noise, the level of noise, and the sizes of the window, frame, frame overlap, etc., considered in existing methods. The main aim of the work presented in this paper is to vary parameters such as window size, frame size, and frame overlap percentage to observe the performance of algorithms for various categories of noise at different levels, and to train the system for all parameter sizes and categories of real-world noisy environments to improve the performance of the speech recognition system. This paper presents the results of Signal-to-Noise Ratio (SNR) and accuracy tests obtained with these variable parameter sizes. It is observed that it is very hard to evaluate the test results and decide which parameter sizes optimize ASR performance. Hence, this study further suggests feasible and optimal parameter sizes using a Fuzzy Inference System (FIS) to enhance accuracy in adverse real-world noisy conditions. This work will be helpful for the discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI).
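To make the varied parameters concrete, here is a minimal sketch, assuming the signal is a list of samples, that splits it into frames of a chosen length with a chosen overlap percentage, applies a Hamming window, and computes SNR in decibels as 10·log10(signal power / noise power). The function names and the Hamming window are illustrative assumptions, not the paper's configuration.

```python
import math


def frame_signal(samples, frame_len, overlap_pct):
    """Split samples into frames of frame_len with overlap_pct percent overlap.
    Trailing samples that do not fill a whole frame are dropped."""
    hop = max(1, int(round(frame_len * (1 - overlap_pct / 100.0))))
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def snr_db(clean, noise):
    """SNR in dB from a clean signal and its additive noise component."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    signal = list(range(10))
    # 4-sample frames with 50% overlap give a hop of 2 samples.
    print(frame_signal(signal, 4, 50))
    print(round(snr_db([1.0, -1.0], [0.1, -0.1]), 2))
```

Sweeping `frame_len` and `overlap_pct` over a grid and recording accuracy per noise type and SNR level is the kind of experiment the abstract describes.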
Achieving and maintaining the performance of a ubiquitous Automatic Speech Recognition (ASR) system is a real challenge. The main objective of this work is to develop a method that improves the performance of a ubiquitous ASR system in real-world noisy environments and keeps it consistent. An adaptive methodology has been developed to achieve this objective by implementing the following:
- Cleaning the speech signal as much as possible while preserving its originality/intangibility, using various modified filters and enhancement techniques.
- Extracting features from speech signals using various parameter sizes.
- Training the system for ubiquitous environments using multi-environment adaptation training methods.
- Optimizing the word recognition rate with appropriately chosen variable parameter sizes using a fuzzy technique.
The consistency of performance is tested using standard noise databases as well as in real-world environments, and a good improvement is observed. This work will be helpful for the discriminative training of ubiquitous ASR systems for better Human Computer Interaction (HCI) using a Speech User Interface (SUI).
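The abstract mentions a fuzzy technique for choosing parameter sizes but gives no rules. A minimal zero-order Takagi-Sugeno sketch, with entirely hypothetical membership functions and rule outputs, maps a measured SNR to a suggested frame length: each rule fires to the degree its membership function matches the input, and the output is the firing-strength-weighted average of the rule consequents. The breakpoints (0, 10, 20 dB) and frame lengths (30, 25, 20 ms) are placeholders, not values from the paper.

```python
def left_shoulder(x, b, c):
    """1 up to b, falling linearly to 0 at c."""
    if x <= b:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - b)


def triangle(x, a, b, c):
    """Triangular membership: 0 at a, peak 1 at b, 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def right_shoulder(x, a, b):
    """0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical rules: (membership of SNR in dB, suggested frame length in ms).
RULES = [
    (lambda s: left_shoulder(s, 0, 10), 30.0),   # low SNR
    (lambda s: triangle(s, 0, 10, 20), 25.0),    # medium SNR
    (lambda s: right_shoulder(s, 10, 20), 20.0), # high SNR
]


def suggest_frame_ms(snr):
    """Zero-order Sugeno inference: weighted average of rule outputs."""
    weights = [mf(snr) for mf, _ in RULES]
    total = sum(weights)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / total


print([suggest_frame_ms(s) for s in (0, 5, 10, 15, 20)])
```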
For about four decades, human beings have dreamed of an intelligent machine that can master natural speech. In its simplest form, this machine should consist of two subsystems: automatic speech recognition (ASR) and speech understanding (SU). The goal of ASR is to transcribe natural speech, while the goal of SU is to understand the meaning of the transcription. Recognizing and understanding a spoken sentence is clearly a knowledge-intensive process that must take into account all variable information about the speech communication process, from acoustics to semantics and pragmatics. While developing an ASR system, we observed that some adverse conditions degrade its performance. In this contribution, a speech enhancement system is introduced to enhance speech signals corrupted by additive noise and to improve the performance of automatic speech recognizers in noisy conditions. ASR experiments show that replacing noisy speech signals with the corresponding enhanced signals improves recognition accuracy. The amount of improvement varies with the type of corrupting noise.
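The abstract does not specify the enhancement algorithm. As an assumption, the sketch below uses classic magnitude spectral subtraction, a standard enhancement method for additive noise: estimate the noise magnitude spectrum from a noise-only frame, subtract it from the noisy magnitude spectrum, clamp to a spectral floor, and resynthesize with the noisy phase. It uses a naive O(n²) DFT to stay dependency-free; a practical system would use an FFT, average the noise estimate over several frames, and overlap-add across the whole signal.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_frame, floor=0.0):
    """Magnitude spectral subtraction on a single frame.
    noise_frame: noise-only frame of the same length, used as the noise estimate.
    floor: fraction of the noise magnitude kept as a lower bound (spectral floor)."""
    Y = dft(noisy_frame)
    N = dft(noise_frame)
    enhanced = []
    for y, nz in zip(Y, N):
        mag = max(abs(y) - abs(nz), floor * abs(nz))
        enhanced.append(mag * cmath.exp(1j * cmath.phase(y)))  # reuse noisy phase
    return [v.real for v in idft(enhanced)]


if __name__ == "__main__":
    frame = [1.0, 2.0, 0.5, -1.0]
    print(spectral_subtract(frame, [0.0] * 4))  # no noise: frame is unchanged
```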
The time-domain waveform of a speech signal carries all of the auditory information. From the phonological point of view, however, little can be said on the basis of the waveform itself. Past research in mathematics, acoustics, and speech technology has provided many methods for converting the data into a form that can be considered information if interpreted correctly. To find statistically relevant information in incoming data, it is important to have mechanisms for reducing each segment of the audio signal to a relatively small number of parameters, or features. These features should describe each segment in such a characteristic way that similar segments can be grouped together by comparing their features. There are many interesting and distinctive ways to describe the speech signal in terms of parameters. Although they all have their strengths and weaknesses, we present some of the most widely used methods and their importance.
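As an illustration of reducing a segment to a few parameters, here is a minimal sketch, assuming the segment is a list of samples, that computes two of the simplest frame-level features: short-time energy and zero-crossing rate. They are chosen for brevity; the abstract does not list which methods the paper covers.

```python
def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def features(frame):
    """Reduce a frame to a 2-element feature vector [energy, ZCR]."""
    return [short_time_energy(frame), zero_crossing_rate(frame)]


print(features([1.0, -1.0, 1.0, -1.0]))
print(features([0.5, 0.6, 0.7, 0.8]))
```

Noise-like (unvoiced) frames tend to show a high zero-crossing rate, while voiced frames tend to show higher energy and a lower zero-crossing rate, so even this two-number summary lets similar segments be grouped.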
Speech is a natural form of communication for human beings, and computers able to understand speech and speak with a human voice are expected to contribute to more natural man-machine interfaces. Such computers are gradually becoming a reality through the evolution of speech recognition technologies, and speech is becoming an important mode of interaction with computers. In this paper, feature extraction is implemented using the well-known Mel-Frequency Cepstral Coefficients (MFCC), and pattern matching is done using the Dynamic Time Warping (DTW) algorithm.
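DTW-based pattern matching can be sketched as template matching: each stored word template and the input utterance are sequences of feature frames (MFCC vectors in the paper; plain numbers here for brevity), and the input is assigned to the template with the smallest DTW distance. The function names and toy templates are hypothetical, and the sketch uses the basic step pattern without slope or band constraints.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping distance between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def recognize(query, templates, dist=lambda x, y: abs(x - y)):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label], dist))


templates = {"yes": [1, 2, 3, 3, 2], "no": [3, 1, 1, 1, 3]}
print(recognize([1, 1, 2, 3, 2], templates))
```

For MFCC frames, passing `dist=math.dist` (Python 3.8+) gives the Euclidean distance between coefficient vectors.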
In this age of information technology, convenient access to information has gained importance. Since speech is a primary mode of communication among human beings, it is natural for people to expect to carry out spoken dialogue with computers. A speech recognition system lets ordinary people speak to the computer to retrieve information. It is desirable to conduct human-computer dialogue in a local language. Hindi, the most widely spoken language in India, is the natural candidate for human-machine interaction. Hindi has five pairs of vowels in which one member is longer than the other. This paper gives an overview of speech recognition systems, including how speech is produced and the properties and characteristics of Hindi phonemes.
Automatic speech recognition enables a wide range of current and emerging applications such as automatic transcription, multimedia content analysis, and natural human-computer interfaces. This paper provides a glimpse of the opportunities and challenges that parallelism presents for automatic speech recognition and related application research, from the point of view of speech researchers. The increasing parallelism in computing platforms opens three major possibilities for speech recognition systems: improving recognition accuracy in non-ideal, everyday noisy environments; increasing recognition throughput in batch processing of speech data; and reducing recognition latency in real-time usage scenarios. This paper describes technical challenges, approaches taken, and possible directions for future research to guide the design of efficient parallel software and hardware infrastructures.
Acoustic mismatch between the training and testing phases considerably degrades speech recognition results. This problem has limited the development of real-world, non-specific applications, since testing conditions are highly variable or even unpredictable during training. Therefore, background noise has to be removed from the noisy speech signal to increase intelligibility and reduce listener fatigue. Enhancement techniques applied as pre-processing stages remarkably improve recognition results. In this paper, a novel approach is used to enhance the perceived quality of the speech signal when the additive noise cannot be directly controlled. Instead of controlling the background noise, we propose to reinforce the speech signal so that it can be heard more clearly in noisy environments. Subjective evaluation shows that the proposed method improves the perceptual quality of speech in various noisy environments. In some cases speaking may be more convenient than typing, even for rapid typists: many mathematical symbols are missing from the keyboard but can easily be spoken and recognized. Therefore, the proposed system can be used in an application for recognizing mathematical symbols (especially symbols not available on the keyboard) in schools.
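The abstract does not describe how the speech is reinforced. One simple interpretation, offered only as an assumption and not as the authors' method, is to amplify the speech until it reaches a target SNR over an estimated noise power. Since SNR in dB is 10·log10(Ps/Pn), the required amplitude gain is g = sqrt(Pn·10^(SNR_target/10) / Ps). The hypothetical functions below compute and apply that gain; they ignore clipping and perceptual weighting, which a practical system would need.

```python
import math


def power(x):
    """Mean power of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def reinforcement_gain(speech, noise, target_snr_db):
    """Amplitude gain that brings the speech-to-noise power ratio to target_snr_db."""
    required_speech_power = power(noise) * 10 ** (target_snr_db / 10)
    return math.sqrt(required_speech_power / power(speech))


def reinforce(speech, noise, target_snr_db):
    """Amplify (never attenuate) the speech toward the target SNR."""
    g = max(1.0, reinforcement_gain(speech, noise, target_snr_db))
    return [g * v for v in speech]


speech = [0.1, -0.1]
noise = [0.1, -0.1]  # 0 dB SNR before reinforcement
louder = reinforce(speech, noise, 20.0)
print(louder, 10 * math.log10(power(louder) / power(noise)))
```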
This paper proposes a new scheme for enhancing the performance of a distributed genetic algorithm (DGA). The initial population is divided into two classes: female and male. Simple distance-based clustering forms clusters around the females. For reclustering, a self-adaptive K-means is used, which automatically locates the initial positions of the centroids and the number of clusters and produces well-distributed, well-separated clusters. Four co-evolution plans are applied to these clusters independently, and the clusters evolve separately. Clusters are merged depending on their performance. Unimodal and multimodal test functions were used for experimentation. Test results show that the new scheme of population distribution gives better performance.
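The initial distance-based clustering step can be sketched as follows, assuming individuals are real-valued vectors and each male joins the cluster of its nearest female under Euclidean distance. The abstract does not say how individuals are labeled female or male or how the self-adaptive K-means reclustering works, so this covers only the first grouping; `cluster_around_females` is a hypothetical name.

```python
import math


def cluster_around_females(females, males):
    """Each female seeds one cluster; each male joins the nearest female's cluster.
    Individuals are tuples of floats; distance is Euclidean (math.dist, Python 3.8+)."""
    clusters = [[f] for f in females]
    for m in males:
        nearest = min(range(len(females)), key=lambda i: math.dist(m, females[i]))
        clusters[nearest].append(m)
    return clusters


females = [(0.0, 0.0), (10.0, 10.0)]
males = [(1.0, 1.0), (9.0, 9.0), (2.0, 0.0)]
print(cluster_around_females(females, males))
```

Each resulting cluster would then evolve independently under one of the co-evolution plans before reclustering and merging.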