ABSTRACT This thesis develops and implements an Automatic Indexing and Segmentation System for long Spanish texts, contributing to automatic text categorization and indexing. Its development draws on the study and improvement of quantitative methods and classical laws of information retrieval, such as the models of word repetition (Zipf, 1949; Mandelbrot, 1953) and of vocabulary growth (Heaps, 1978). The thesis critically examines the circumstances under which these models are applied and studies the stability of their experimental parameters, measured through counts over words and text fragments. It establishes recommendations for the a priori values of those parameters, depending on the circumstances of application and the type of text analyzed. It also observes the behaviour of the parameters of the formulas in order to discern a direct relationship with the type of text analyzed. A new model (log-%) is proposed to visualize the frequency distribution of the words of a text. The ultimate goal is to identify the thematic changes that occur within a document, in order to establish its topical structure and to index each of its parts automatically. The result is a categorization of the text or document, expressed either as a list of its thematic parts by level or as a tree structure. Once the thematic parts of the text have been formed at their levels, together with their indexing terms, these blocks are grouped hierarchically and arranged according to the breakdown of the document in question. The initial block describes the overall content of the entire document with an initial set of words or descriptors. This initial block is then subdivided into several blocks corresponding to the different parts of the document, each of which also contains a number of words describing its content, and so on, forming as many divisions as necessary until a description of each paragraph of the document is reached.
The terms that will ultimately form part of the thematic map produced by the Automatic Indexing and Segmentation System are a combination of words obtained from the text by co-occurrence with words that exceed the appropriate threshold. The terms are placed automatically at each level of segmentation using the similarities between them and the log-% model mentioned above. This doctoral thesis comprises not only the theoretical and conceptual basis of automatic indexing and segmentation but also the implementation and review of the computer applications that provide the basis for the experiments of this research.
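
As an illustration of the classical laws cited above, the following minimal sketch estimates the Zipf exponent and the Heaps parameters of a tokenized text by ordinary least squares in log-log space. This is a common simple estimator and is offered only as an assumption about how such parameters might be obtained; it is not necessarily the procedure used in the thesis, and it does not implement the log-% model. The function names (`tokenize`, `linear_fit`, `zipf_exponent`, `heaps_parameters`) and the `step` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens; the pattern is Unicode-aware,
    # so accented Spanish letters (á, é, ñ, ...) are kept.
    return re.findall(r"[^\W\d_]+", text.lower())


def linear_fit(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    # Requires at least two distinct x values.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zipf_exponent(tokens):
    # Zipf: f(r) ~ C / r**s, hence log f = log C - s * log r.
    # Returns the estimated pair (s, C).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    slope, intercept = linear_fit(xs, ys)
    return -slope, math.exp(intercept)


def heaps_parameters(tokens, step=1):
    # Heaps: V(n) ~ K * n**beta, hence log V = log K + beta * log n.
    # The vocabulary size is sampled every `step` tokens (assumed parameter).
    # Returns the estimated pair (beta, K).
    seen = set()
    xs, ys = [], []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        if n % step == 0:
            xs.append(math.log(n))
            ys.append(math.log(len(seen)))
    beta, log_k = linear_fit(xs, ys)
    return beta, math.exp(log_k)
```

Applied to successive fragments of a long text, estimates of this kind could be compared across fragment sizes to examine how stable the parameters are, which is the sort of question the abstract raises.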