au::id::jericho::lib::html::Tag Class Reference

Inherits au::id::jericho::lib::html::Segment, and au::id::jericho::lib::html::HTMLElementName.

Inherited by au::id::jericho::lib::html::EndTag, and au::id::jericho::lib::html::StartTag.

Detailed Description

Represents either a StartTag or an EndTag in a specific source document (see Source).

Tag Parsing Process

The following process describes how each tag is identified by the parser:

  1. Every '<' character found in the source document is considered to be the start of a tag. The characters following it are compared with the start delimiters (TagType::getStartDelimiter()) of all the registered (TagType::register()) tag types (TagType), and a list of matching tag types is determined.
  2. A more detailed analysis of the source is performed according to the features of each matching tag type from the first step, in order of precedence, until a valid tag is able to be constructed.

    The analysis performed in relation to each candidate tag type is a two-stage process:

    1. The position of the tag is checked to determine whether it is valid (TagType::isValidPosition(Source,int)). In theory, a server tag (TagType::isServerTag()) is valid in any position, but a non-server tag is not valid inside another non-server tag.

      The TagType::isValidPosition(Source, int pos) method is responsible for this check and has a common default implementation for all tag types (although custom tag types can override it if necessary). Its behaviour differs depending on whether or not a full sequential parse (Source::fullSequentialParse()) is performed. See the documentation of the isValidPosition method for full details.

    2. A final analysis is performed by the TagType::constructTagAt(Source, int pos) method of the candidate tag type. This method returns a valid Tag object if all conditions of the candidate tag type are met, otherwise it returns null and the process continues with the next candidate tag type.
  3. If the source does not match the start delimiter or syntax of any registered tag type, the segment spanning it and the next '>' character is taken to be an unregistered tag (isUnregistered()). Some tag search methods ignore unregistered tags. See the isUnregistered() method for more information.

See the documentation of the TagType class for more details on how tags are recognised.
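
The three steps above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: TagParsingSketch, CandidateType, ParsedTag, the toy registry and the insideNonServerTag flag are all hypothetical names invented for illustration and are not part of the library. The syntax checks are deliberately crude, and treating the unregistered fallback like a non-server tag is an assumption of the sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the tag parsing process.
// None of these types are part of the Jericho library.
final class TagParsingSketch {

    /** A recognised tag: the half-open span [begin, end) and the type that matched. */
    static final class ParsedTag {
        final int begin;
        final int end;
        final String typeName;

        ParsedTag(int begin, int end, String typeName) {
            this.begin = begin;
            this.end = end;
            this.typeName = typeName;
        }
    }

    /** Stand-in for a registered tag type. */
    static final class CandidateType {
        final String name;
        final String startDelimiter;
        final String closingDelimiter;
        final boolean serverTag;
        final boolean needsLetterAfterDelimiter;

        CandidateType(String name, String startDelimiter, String closingDelimiter,
                      boolean serverTag, boolean needsLetterAfterDelimiter) {
            this.name = name;
            this.startDelimiter = startDelimiter;
            this.closingDelimiter = closingDelimiter;
            this.serverTag = serverTag;
            this.needsLetterAfterDelimiter = needsLetterAfterDelimiter;
        }

        // Stage 1: a server tag is valid anywhere; any other tag is not valid
        // inside another non-server tag.
        boolean isValidPosition(boolean insideNonServerTag) {
            return serverTag || !insideNonServerTag;
        }

        // Stage 2: return a tag if this type's (toy) syntax is satisfied, else null.
        ParsedTag constructTagAt(String source, int pos) {
            int afterDelimiter = pos + startDelimiter.length();
            if (needsLetterAfterDelimiter
                    && (afterDelimiter >= source.length()
                        || !Character.isLetter(source.charAt(afterDelimiter)))) {
                return null;
            }
            int close = source.indexOf(closingDelimiter, afterDelimiter);
            return close < 0 ? null : new ParsedTag(pos, close + closingDelimiter.length(), name);
        }
    }

    /** Toy registry, listed in order of precedence (an assumption of this sketch). */
    static List<CandidateType> defaultTypes() {
        return Arrays.asList(
            new CandidateType("comment", "<!--", "-->", false, false),
            new CandidateType("server", "<%", "%>", true, false),
            new CandidateType("normal", "<", ">", false, true));
    }

    static ParsedTag parseTagAt(String source, int pos, List<CandidateType> registered,
                                boolean insideNonServerTag) {
        if (pos < 0 || pos >= source.length() || source.charAt(pos) != '<') {
            return null; // only a '<' character can start a tag
        }
        // Step 1: collect the types whose start delimiter matches at pos.
        List<CandidateType> matching = new ArrayList<>();
        for (CandidateType type : registered) {
            if (source.startsWith(type.startDelimiter, pos)) {
                matching.add(type);
            }
        }
        // Step 2: try each candidate in precedence order, two stages per candidate.
        for (CandidateType type : matching) {
            if (!type.isValidPosition(insideNonServerTag)) {
                continue;
            }
            ParsedTag tag = type.constructTagAt(source, pos);
            if (tag != null) {
                return tag;
            }
        }
        // Step 3: fall back to an unregistered tag ending at the next '>'.
        // Assumption of this sketch: the fallback counts as a non-server tag.
        if (insideNonServerTag) {
            return null;
        }
        int close = source.indexOf('>', pos);
        return close < 0 ? null : new ParsedTag(pos, close + 1, "unregistered");
    }
}
```

In this model, calling parseTagAt on "<@x>" at position 0 falls through to step 3 because the toy "normal" type rejects a non-letter after '<', while "<% code %>" is still recognised when insideNonServerTag is true because the toy "server" type is flagged as a server tag.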

Tag Search Methods

Methods that find tags in a source document are collectively referred to as Tag Search Methods. They are found mostly in the Source and Segment classes, and can be generally categorised as follows:

Open Search:
These methods search for tags of any name (getName()) and type (getTagType()).
Named Search:
These methods usually include a parameter called name which is used to specify the name (getName()) of the tag to search for. In some cases named search methods do not require this parameter because the context or name of the method implies the name to search for. In tag search methods specifically looking for start tags, specifying a name that ends in a colon (:) searches for all start tags in the specified XML namespace.
Tag Type Search:
These methods usually include a parameter called tagType which is used to specify the type (getTagType()) of the tag to search for. In some methods the search parameter is restricted to the StartTagType subclass of TagType.
Other Search:
A small number of methods do not fall into any of the above categories, such as the methods that search on attribute values (Source::findNextStartTag(int pos, String attributeName, String value, boolean valueCaseSensitive)).
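
The first three categories can be pictured as three filters over the same list of tags. The following is a minimal sketch and not the library's implementation: TagSearchSketch, TagInfo and the sample data are hypothetical, tag types are reduced to plain string labels, and names are compared exactly as given (any case normalisation is left out). It mainly illustrates the rule that a search name ending in a colon selects every start tag in that namespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the search categories; not the Jericho API.
final class TagSearchSketch {

    /** Minimal stand-in for a start tag: a name and a tag type label. */
    static final class TagInfo {
        final String name;
        final String type;

        TagInfo(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Open search: every tag, whatever its name or type.
    static List<TagInfo> findAll(List<TagInfo> tags) {
        return new ArrayList<>(tags);
    }

    // Named search: an exact name, or a whole namespace if the name ends in ':'.
    static List<TagInfo> findAllByName(List<TagInfo> tags, String name) {
        List<TagInfo> result = new ArrayList<>();
        for (TagInfo tag : tags) {
            boolean matches = name.endsWith(":")
                    ? tag.name.startsWith(name)
                    : tag.name.equals(name);
            if (matches) {
                result.add(tag);
            }
        }
        return result;
    }

    // Tag type search: only tags carrying the requested type label.
    static List<TagInfo> findAllByType(List<TagInfo> tags, String type) {
        List<TagInfo> result = new ArrayList<>();
        for (TagInfo tag : tags) {
            if (tag.type.equals(type)) {
                result.add(tag);
            }
        }
        return result;
    }

    /** Illustrative sample data only. */
    static List<TagInfo> sampleTags() {
        return Arrays.asList(
            new TagInfo("p", "normal"),
            new TagInfo("o:p", "normal"),
            new TagInfo("o:lock", "normal"),
            new TagInfo("!--", "comment"));
    }
}
```

With the sample data, findAllByName(tags, "o:") selects the two tags in the "o" namespace, whereas findAllByName(tags, "p") selects only the tag named exactly "p".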

Definition at line 120 of file Tag.java.

Public Member Functions

final char charAt (final int index)
int compareTo (final Object o)
final boolean encloses (final int pos)
final boolean encloses (final Segment segment)
final boolean equals (final Object object)
String extractText (final boolean includeAttributes)
String extractText ()
List findAllCharacterReferences ()
List findAllComments ()
List findAllElements (final StartTagType startTagType)
List findAllElements (String name)
List findAllElements ()
List findAllStartTags (final String attributeName, final String value, final boolean valueCaseSensitive)
List findAllStartTags (String name)
List findAllStartTags ()
List findAllTags (final TagType tagType)
List findAllTags ()
List findFormControls ()
FormFields findFormFields ()
Tag findNextTag ()
Tag findPreviousTag ()
final List findWords ()
final int getBegin ()
List getChildElements ()
String getDebugInfo ()
abstract Element getElement ()
final int getEnd ()
final String getName ()
Segment getNameSegment ()
String getSourceText ()
final String getSourceTextNoWhitespace ()
abstract TagType getTagType ()
Object getUserData ()
int hashCode ()
void ignoreWhenParsing ()
boolean isComment ()
abstract boolean isUnregistered ()
final boolean isWhiteSpace ()
final int length ()
Attributes parseAttributes ()
abstract String regenerateHTML ()
void setUserData (final Object userData)
final CharSequence subSequence (final int beginIndex, final int endIndex)
abstract String tidy ()
String toString ()

Static Public Member Functions

static final boolean isWhiteSpace (final char ch)
static final boolean isXMLName (final CharSequence text)
static final boolean isXMLNameChar (final char ch)
static final boolean isXMLNameStartChar (final char ch)

Static Public Attributes

static final String A = "a"
static final String ABBR = "abbr"
static final String ACRONYM = "acronym"
static final String ADDRESS = "address"
static final String APPLET = "applet"
static final String AREA = "area"
static final String B = "b"
static final String BASE = "base"
static final String BASEFONT = "basefont"
static final String BDO = "bdo"
static final String BIG = "big"
static final String BLOCKQUOTE = "blockquote"
static final String BODY = "body"
static final String BR = "br"
static final String BUTTON = "button"
static final String CAPTION = "caption"
static final String CENTER = "center"
static final String CITE = "cite"
static final String CODE = "code"
static final String COL = "col"
static final String COLGROUP = "colgroup"
static final String DD = "dd"
static final String DEL = "del"
static final String DFN = "dfn"
static final String DIR = "dir"
static final String DIV = "div"
static final String DL = "dl"
static final String DOCTYPE_DECLARATION = StartTagType.DOCTYPE_DECLARATION.getNamePrefixForTagConstant()
static final String DT = "dt"
static final String EM = "em"
static final String FIELDSET = "fieldset"
static final String FONT = "font"
static final String FORM = "form"
static final String FRAME = "frame"
static final String FRAMESET = "frameset"
static final String H1 = "h1"
static final String H2 = "h2"
static final String H3 = "h3"
static final String H4 = "h4"
static final String H5 = "h5"
static final String H6 = "h6"
static final String HEAD = "head"
static final String HR = "hr"
static final String HTML = "html"
static final String I = "i"
static final String IFRAME = "iframe"
static final String IMG = "img"
static final String INPUT = "input"
static final String INS = "ins"
static final String ISINDEX = "isindex"
static final String KBD = "kbd"
static final String LABEL = "label"
static final String LEGEND = "legend"
static final String LI = "li"
static final String LINK = "link"
static final String MAP = "map"
static final String MENU = "menu"
static final String META = "meta"
static final String NOFRAMES = "noframes"
static final String NOSCRIPT = "noscript"
static final String OBJECT = "object"
static final String OL = "ol"
static final String OPTGROUP = "optgroup"
static final String OPTION = "option"
static final String P = "p"
static final String PARAM = "param"
static final String PRE = "pre"
static final String PROCESSING_INSTRUCTION = StartTagType.XML_PROCESSING_INSTRUCTION.getNamePrefixForTagConstant()
static final String Q = "q"
static final String S = "s"
static final String SAMP = "samp"
static final String SCRIPT = "script"
static final String SELECT = "select"
static final String SERVER_COMMON = StartTagType.SERVER_COMMON.getNamePrefixForTagConstant()
static final String SERVER_MASON_COMPONENT_CALL = MasonTagTypes.MASON_COMPONENT_CALL.getNamePrefixForTagConstant()
static final String SERVER_MASON_NAMED_BLOCK = MasonTagTypes.MASON_NAMED_BLOCK.getNamePrefixForTagConstant()
static final String SERVER_PHP = PHPTagTypes.PHP_STANDARD.getNamePrefixForTagConstant()
static final String SMALL = "small"
static final String SPAN = "span"
static final String STRIKE = "strike"
static final String STRONG = "strong"
static final String STYLE = "style"
static final String SUB = "sub"
static final String SUP = "sup"
static final String TABLE = "table"
static final String TBODY = "tbody"
static final String TD = "td"
static final String TEXTAREA = "textarea"
static final String TFOOT = "tfoot"
static final String TH = "th"
static final String THEAD = "thead"
static final String TITLE = "title"
static final String TR = "tr"
static final String TT = "tt"
static final String U = "u"
static final String UL = "ul"
static final String VAR = "var"
static final String XML_DECLARATION = StartTagType.XML_DECLARATION.getNamePrefixForTagConstant()

Package Functions

final boolean includeInSearch ()
 Tag (final Source source, final int begin, final int end, final String name)

Static Package Functions

static final StringBuffer appendCollapseWhiteSpace (final StringBuffer sb, final CharSequence text)
static final Tag findPreviousOrNextTag (final Source source, final int pos, final TagType tagType, final boolean previous)
static final Tag findPreviousOrNextTag (final Source source, final int pos, final boolean previous)
static final Tag findPreviousOrNextTagUncached (final Source source, final int pos, final TagType tagType, final boolean previous, final int breakAtPos)
static final Tag findPreviousOrNextTagUncached (final Source source, final int pos, final boolean previous, final int breakAtPos)
static Iterator getNextTagIterator (final Source source, final int pos)
static final Tag getTagAt (final Source source, final int pos)
static final Tag getTagAtUncached (final Source source, final int pos)
static final Tag[] parseAll (final Source source, final boolean assumeNoNestedTags)

Package Attributes

int allTagsArrayIndex = -1
final int begin
List childElements = null
Element element = Element.NOT_CACHED
final int end
String name = null
final Source source

Static Private Member Functions

static final Tag parseAllFindNextTag (final Source source, final ParseText parseText, final int pos, final boolean assumeNoNestedTags)

Private Attributes

Object userData = null

Static Private Attributes

static final boolean INCLUDE_UNREGISTERED_IN_SEARCH = false


Classes

class NextTagIterator

The documentation for this class was generated from the following file:

Generated by Doxygen 1.6.0